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Abstract 

Large kernel systems are prone to be ill-conditioned. Pivoted Cholesky decompo¬ 
sition (PCD) render a stable and efficient solution to the systems without a pertur¬ 
bation of regularization. This paper proposes a new PCD algorithm by tuning Cross 
Approximation (CA) algorithm to kernel matrices which merges the merits of PCD 
and CA, and proves as well as numerically exemplifies that it solves large kernel sys¬ 
tems two-order more efficiently than those resorts to regularization. As a by-product, 
a diagonal-pivoted CA technique is also shown efficient in eigen-decomposition of 
large covariance matrices in an uncertainty quantification problem. 


1 Introduction 

Interpolation of spatial data with kernel functions m finds wide application in many 
fields of scientific computing. The problem is prone to be ill-conditioned when a large 
number of data or smooth kernel functions are used in hope of increasing accuracy, since 
smooth kernels have fast decaying Fourier series as their eigenvalues. Nevertheless a 
good interpolation accuracy is always accompanied by a bad condition number of the 
interpolation matrix[26j. This is like a high-wire walk between two abysses, bad accuracy 
on the left and numerical instability on the right, no wonder various regularization tech¬ 
niques 12a are usually adopted as safety ropes. In this paper we show pivoted Cholesky 
decomposition, a matrix approximation technique, is a much more efficient solver in this 
situation which guarantees the stability without the perturbation of regularization while 
reduces the complexity of solution from 0{v?) to 0(k 2 n) for a rank-fc system. And 
we propose an improved pivoted Cholesky decomposition algorithm by tuning Cross 
Approximation technique to symmetric and positive definite matrices. 

Pivoted Cholesky decomposition (PCD)® ED eg KZI and Cross Approximation mm 
are two favorable low rank approximation techniques with linear complexity in n (as a 
contrast to the cubic complexity of a reduced singular value decomposition). Compar¬ 
ing to other linearly complex techniques like Nystrom approximation mm and Sparse 
greedy approximations [2p] they are more accurate due to the pivot maximizations [lj. 
Another merit of them is they provide deterministic error bound thus can be run adap¬ 
tively to a certain accuracy. 
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PCD is a pivoted version of the outer product Cholesky algorithm m Algorithm 
4.2.2] that chooses the diagonal entry with the largest modulus as the pivot. In each 
iteration a rank-1 approximation is made based on the column and row cross at the 
pivot so that k iterations accumulates a rank-A; approximation. Earlier versions of this 
algorithm, e.g. LINPACK routine SCHDC [[8], include a global updating in which the 
rank-1 approximation is subtracted from the remainder matrix (or the original matrix for 
the first iteration), this made them more expensive than the later ones in HD El El EE EQ 
in which the updating only occurs in the pivoted columns. 

When Cross Approximation (CA) is applied to symmetric positive definite matrices it 
can be similar to a PCD, i.e. its result is only a row permutation away from a triangular 
matrix. If pivot is chosen only in the diagonal the fully pivoted CA and the early version 
of PCD are nearly identical with the only difference in the permutation. While choosing 
pivot in the diagonal in PCD is justified in [3j as maximizing the lower bound of gain in 
each iteration, it is not yet justified in the language of CA. The CA algorithm, with the 
merit of being simpler (without row swapping and nested entry indexing) and providing 
sharp error bound in uniform norm, can be adapted to a PCD if this gap is bridged. 

In this work we first bridge this gap and merge the merits of the two regimes into a 
new PCD algorithm which is simpler and gives sharp uniform-norm error bound, and then 
justify the validity and efficiency advantage of the algorithm in solving ill-conditioned 
kernel systems. As a by-product a fast approximation of matrix determinant is also 
given. Applications of the algorithm are given in solving ill-conditioned radial basis 
function (RBF) systems. 

The rest of this paper is organized as follows. In Section [2] we recall the basic 
Cross Approximation algorithm and introduce an adaption to symmetric positive definite 
matrices. In Section [3] we extend the adapted CA to PCD. In Section [4] we justify 
the PCD solution to rank deficient kernel systems. Section [5] gives applications of the 
algorithms. Section |6] summarizes the whole paper. 

2 Cross Approximation 

Cross Approximation (CA) is an iterative matrix approximation technique [U |5J that 
yields a result equivalent to a skeleton decomposition Lemma 3]. Like in many other 
methods, CA constructs a rank-A; approximation by using only a small portion, i.e. some 
k rows and k columns, of the matrix. Hence widely used for data compression [221 EE]. 
m suggests to choose the columns and rows so that their intersection has the largest 
determinant in modulus among all k x k submatrices (the maximal-volume principle ), 
and m shows such an approximation is quasi-optimal in uniform norm. 

In each iteration of CA, a rank-1 approximation is made and subtracted from the 
remainder matrix. It turns out that choosing the pivot (the intersection of the chosen 
column and row that form the rank-1 approximation) as the entry with the largest 

1 In [2j |3] this method was introduced in the name incomplete Cholesky decomposition 
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modulus in the remainder matrix is the optimal strategy in term of maximal-volume 
principle if we keep the pivot points in the previous iterations fixed (like in CA) [U 
Lemma 2] . 

For very large matrices, CA with global pivot maximization and global updating 
(which is termed as fully pivoted CA) is expensive or even prohibitive. A trade-off of 
accuracy and cost is made in the so-called partially pivoted CA which only searches for 
the maximum in some rows and columns. According to (U Lemma 2] this would be less 
preferable in term of maximal-volume principle, but it runs much faster. 

However, if the underlying matrix is symmetric positive semi-definite (SPSD) much 
of the cost of a fully pivoted CA can be saved without sacrificing accuracy. This is due 
to that the diagonal entries of SPSD matrices always include the maximum in modulus, 
and that the remainder matrices can be kept SPSD during the CA iterations. 

A SPSD matrix-adapted CA is proposed in this section. This algorithm has a com¬ 
plexity linear in the matrix size as does the partially pivoted CA, while retains the 
accuracy of a fully pivoted CA. We start from introducing the “baseline” - fully pivoted 
CA algorithm. 

2.1 Notation 

Matlab-like notation is used, i.e. a matrix A’s i-th column and j-th row are written 
as A[:,z] and A\j, :] respectively, but for an identity matrix these are written as and 
ej. We use the same notations for different sizes wherever the actual size is clear by the 
context. 

2.2 The basic algorithm 

CA algorithms produce a factorization AB approximating a matrix AT, i.e. AB ~ AT. 
Fully pivoted CA (as detailed in Algorithm |T]) is the most basic, accurate and also the 
most expensive one since it makes global pivot maximization and global updation of 
remainder matrix. It yields a a factorization AB with a user specified rank fc or a 
specified maximum entry-wise error e comparing to AT. 

The following Lemma shows the AB equals to a pseudo-skeleton decomposition [15J 
of AT. 

Lemma 1. /£, Lemma 4-6] Given matrices A and B as obtained in Algorithm^ approx¬ 
imating AT, /3 = {i £ } k i=1 and j — collect the pivots indices ii and j# in Algorithm 

[7J then 

AB = M[:,j]-M[{3,j]- 1 -M[f3,:} (1) 

The determinant of the submatrix AT[/3, j] can be computed conveniently by 

k 

det(M[/3,j]) = (2) 

e= 1 
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Algorithm 1 Fully pivoted Cross Approximation 

Require: M G W nxn , k E Z, 1 < k < min (m, n ) (or e to i € M for an adaptive version) 

Ensure: A E £? E e G ffi 

1: R:=M, £:= 1 

2: while t? < k and e > 0 (or while e > e to i ) do 
3: := argmaXj j |Rr[i, j] \ 

4: := , e = 7r 

5: A[:,4 :=#*[:, j*] 

6: :] := Re[ie, :]/7^ 

7: -R/:+i := Rt ~ AB 

8: £:=£+l 

9: end while 


with 7 ^ as defined in Algorithm [I] 

The maximal-volume principle 53] suggests the optimal pivoting strategy is to choose 
/3 and j so that the determinant of M[{3,j] is maximal in modulus among all k x k 
submatrices. So by <§ maximizing 7 ^ is a “greedy” strategy which maximizes the gain 
in each iteration. 

This fully pivoted CA algorithm is of complexity 0{kmnn) which is quadratic in 
matrix size. A linear complexity 0{k 2 (m + n)) can be achieved by the partially pivoted 
CA which searches only submaximal y^’s in the pivoted row and column in the previous 
iteration. This generally would sacrifice the accuracy of approximation. 

2.3 Tuned to symmetric positive semi-definite matrices 

However, a linear complexity can also be achieved without sacrificing accuracy in ap¬ 
proximating symmetric positive semi-definite (SPSD) matrices. This is due to the fact 
that a SPSD matrix always have a maximum in modulus in the diagonal, and that the 
remainder matrix R can be kept SPSD during the CA iterations, as justified below. 

Lemma 2 . If matrix M is SPSD, then 

(i) its diagonal entries are non-negative. 

(ii) its diagonal entries include a global maximum in modulus. 

Proof. (i) This follows the fact ejM = M [z, i\ > 0 for i = 1, 2, • • • , n. 

(ii) From 521 P-147], we have for any valid index i and j 

(e* ± ej) T M(ei ± ej) = M[i,i\ + M[j,j} ± 2 M[i,j] > 0 

in which we see for any off-diagonal entry there is an diagonal entry not 

smaller than it in modulus. 

□ 
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Lemma 3. f2l 1 Theorem 1.2.51 123 , p-40] If all principal minors of a symmetric matrix 
are nonnegative, the matrix is positive semi-definite. 

We prove the positive semi-definiteness of the remainder matrix R as following. 

Proposition 1. In approximating a SPSD matrix M, if CA chooses the pivot in the 
diagonal, the remainder matrix remains SPSD. 


Proof. Suppose the diagonal entry p is used as the pivot, and X,Y,Z and u,v are 
submatrices and vectors in M. The remainder matrix R after the first iteration is 


R = M — AA t = 


(X v Z T \ 

v T p u T 

\Z u Y 


1 

P 


( v\ 
P 

w 


(v T p 1i T ) 


(X — I vv T 0 Z T — Ivu T \ 


p 

0 T 0 

\Z — -uv T 0 Y — -uu 1 J 

\ P V / 


p 

0 T 

- 7/7/ T 

P 


(3) 


Because M is SPSD, so is its principle submatrix 


X u 

P/ 


and we have the relation 


'X n' 
n T P/ 


A y 

. 0 T 1 , 


'X-lvv T O' 
, ° T Pj 


p 0 


:= P t QP 


\P 


in which the invertibility of P implies that Q is also SPSD. Hence X — Ivv T and all 
its principal submatrices are nonnegative m Corollary 14.2.12]. And similarly this also 
holds for Y — Iuu T and all its principal submatrices. 

From Equation ^ it is clear that all principal submatrices of R that are not contained 
in X — Ivv T or Y—Iuu T (including R itself) have at least one zero column and one zero 
row thus have zero determinants. Therefore all principal minors of R are nonnegative, 
so its positive semidefinitedness follows Lemma [2j and its symmetricity comes from the 
symmetricity of AA r . Applying this rule for every iterations proves the proposition. □ 


By Lemma [2] and Proposition [I] we see in approximating SPSD matrices it suffices for 
CA to do the pivot maximization only in the diagonal, this leads to a CA algorithm that 
achieves a linear complexity 0(k 2 n) as in the partially pivoted CA but is as accurate as 
the fully pivoted one. Like in the partially pivoted CA in this new algorithm we avoid 
generating and updating the whole matrix, also save the storage for it. The algorithms 
yields a rank-A; approximation in the form AA r ~ M and the maximum entry-wise 
error e = || M - AA t | |oo, as detailed in Algorithm^ Accuracy of the approximation 
can be controlled by using an adaptive version that terminates as e drops below a given 
threshold 6 to i- 

Remark 1: The maximal entry-wise error of the algorithm as in Proposition [l] is just 
the largest diagonal entry (in modulus) of the remainder matrix R , since R is the error 
matrix and SPSD. 
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Algorithm 2 Diagonal pivoted Cross Approximation 

Require: M G M nXri (or a function j^{%) yielding the z-th column of Af), 
fc G Z, 1 < fc < n (or e to i G M for an adaptive version) 

Ensure: A G M nx/c , £ G R, (3 G 

1: Initialize empty A G M nxfc , , /3 G 
2: d := diag(Af), £ 1 

3: £ := max^ d[i\ 

4: while t < k and £ > 0 (or while £ > etoi ) do 

5: := argmax^ |d[z]| 

6: 7^ = d[i*] 

7: A[:,£] := 

(or replace Af[:,ig] by 
8 : d = d -( A [:,£]) 2 

9: £ := |7*| 

10: /3[£] := 

11: £:=£+l 

12: end while 


Remark 2 : Since the algorithm only works with one single column of Af in each itera¬ 
tion, it does not require the whole matrix to be generated. This is especially beneficial 
for very large matrices that are beyond memory capacity whereon the algorithm can 
take a function that returns only the z-th column as input. 

Algorithm [2] produces the same result as would a fully pivoted CA(Algorithm [l]), since 
it just make the same global maximization in a more efficient way by taking advantage 
of SPSD properties and accordingly only updates the relevant entries. E.g. though here 
only the diagonal of R is updated(as in step 8) in each iteration, once a pivot column 
is chosen its “owed” updation is redeemed(as in step 7). So the error bound mentioned 
in Remark 1 also holds for Algorithm [2] and the maximum entry-wise error is just the 
maximal of d. 

Notice that selecting pivot as the maximum in modulus in the diagonal is also proved 
optimal in [3j in term of maximizing lower bound of gain in each rank-1 approximation 
and in m in term of minimizing trace norm error. 

3 Pivoted Cholesky approximations 

In Equation ([3]) we see in each iteration the CA algorithm leaves one additional zero 
column and zero row in the remainder matrix, so that the j -th column of A has one 
more zero entry than the (j — l)-th, hence A is just a row permutation away from a 
triangular matrix. Appending a row permutation to Algorithm [2] leads to a pivoted 
Cholesky decomposition, M « LL T with L a n-by-k lower triangular matrix and M a 
symmetric permutation of M based on an index p yielded by the algorithm, as detailed in 
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Algorithm [3] and diagrammed in the upper part of Figure [l] The algorithm also produces 
a k-by-k triangular matrix L* that exactly reproduces the submatrix AT* which is the 
cross of the pivoted rows and columns. The reduced factorization AT* = L*Lj is useful 
in solving rank deficient systems as explained in Section |4j 

Though with the same complexity 0(fc 2 n), this algorithm is simpler than other PCD 
algorithms in pH p.255], [a p.20] and p3] since it involves neither row and column 
permutations in the iterations nor nested entry indexing which deteriorates computing 
performance. Another difference is that it gives maximal entry-wise error e — || M — 
LL t |loo in contrast to the bound of error in the sum of eigenvalues given by other PCD 
algorithms . 

Algorithm 3 Low rank pivoted Cholesky decomposition based on CA 
Require: AT G R nXn , k G Z, 1 < k < n (or e to i G R for an adaptive version) 

Ensure: L G R nxfc , T* G M /cx/c , e G R, p G 

1: Run Algorithm [ 2 ] to obtain A G R nxfc , e G R and (3 G 
2: j := {1, 2, • • • , n} \ (3 # complement of (3 

3: p := \(3 j} # concatenation of (3 and j , permutation index 

4: L := A[p, :] # rearrange rows according to p 

5: L* := A[/3, :] # rearrange rows according to f3 


The convergence of PCD is proved to be exponentially fast in k if the function 
underlies matrix M has exponentially decaying eigenvalues m Theorem 2]. This class 
of function is not rare, e.g. analytical functions is one of them [28]. 



Figure 1: Algorithm [ 3 J Unshaded zones depict zero entries. Upper: M ~ LL T . Lower: 
AT* — L*L~J. 


N 



Figure 2: Algorithm [ 4 J M ~ L n L^. Unshaded zones depict zero entries. 
Accuracy of Algorithm [3] can be improved by a slight variation with minor extra cost. 
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The variant yields a full rank factorization, AT ~ L n L ^ with L n n-by-n matrix, by filling 
the empty diagonal entries of L with the square root of the non-pivoted diagonal entries 
of R at the end of the procedure. This is detailed in Algorithm [4] and diagrammed in 
Figure [2} 


Algorithm 4 Full rank pivoted Cholesky decomposition 
Require: AT G R nXri , k G Z, 1 < k < n (or etoi G R) 
Ensure: L n G R nxn , £ G R, p G Z 71 


1 : 

2 : 

3: 

4: 

5 : 

6 : 


Run Algorithm [2] with a slight variation that initializes A as an n-by-n matrix to 
obtain A G R nxn , d G R n , £ G R, /3 G Z fc and £ G Z 
j := {1, 2, • • • , n} \ /3 # complement of (3 in {1, 2, • • • , n} 

<7 := {£, £+!,••• , n\ 


A\j,q\ := 

P := [P j] 
-t'n := : ] 


ff fill the unspecified diagonal entries 
ff concatenation of /3 and j 
# rearrange rows according to p 


Apart from the exact reproducing of the pivot columns and rows as do all CA and 
PCD algorithms^ Lemma 4.5], Algorithms [4] additionally exactly reproduces the di¬ 
agonal entries of the original matrix (this can be easily seen by the all-zero diagonal of 
R — M — L n Ln) while the off-diagonal part remains the same as that from Algorithm 3 
£ in Algorithm [4] is a loose bound, rather than the exact maximum as in Algorithm 3, 
of entry-wise error. 

Algorithm [4] also provides a fast approximation to determinant of AT, 

n 

det(M) « det {L n Ll) = ]J(L n [i , i}) 2 . (4) 

2=1 

If the matrix is ill-conditioned the product in Q would easily underflow, the determinant 
should be stored in logarithm which is the sum of logarithm of the factors. In some 
scenarios it is useful to quantify very small determinants (well below machine zero) rather 
than setting them to zeros, for example in maximum likelihood training of Kriging this 
gives the optimizer more information about the “trend”. 

This determinant approximation is much faster than the 0(fc 2 n)-complex one based 
on matrix determinant lemma [19, p.416] that uses n-by-k matrix X, 

det (A I + LL t ) = \ n ~ k det (A I k + L T L) 
which is perturbed by the “nugget” A. 

In some cases, e.g. solving a linear system Mx = /, we need an invertible factor¬ 
ization, this of course can be furnished by Algorithm [ 4 ] if the diagonal filling \fd\j] is 
large enough (which is often not the case). Another more efficient approach is to use 
only the submatrix AT* = L*Lj and solve a reduced system. In case of rank deficiency 
this leads to a solution of the original system at only an 0(k 2 ) cost in addition to the 
cost of PCD, as detailed in the next section. 










4 To solve large kernel systems 


Large kernel matrices are prone to be ill-conditioned or rank-deficient, and it is also well- 
known that a well-performing kernel system usually associates with an ill-conditioned 
matrix. In fact there is a rule in this field reminiscent of the Uncertainty Principle in 
quantum mechanics, stating that the condition number and the accuracy cannot be both 
good [26] . Regularization is usually the cure in this situation. But we would show here 
PCD can solve these systems more efficiently. 

Let’s take radial basis function (RBF) as an example. Consider modeling a function 
/ : D M on some compact domain D C by linear combination of radial basis 
function <p : R + —>> M each centered at one of the points in a scattered and distinct set 
X = {ajW,... , x^} C fl: 

n 

f(x) re s(x) = with = 0(11* - II) (5) 

2=1 

|| • || denotes Euclidean norm. The coefficient w is uniquely determined by fitting s(x) 
to n samples of / at X, i.e. 

f (6) 

$ij = M x ^) and fi = /0 W )> as long as the matrix <l> is positive definite which can 
be furnished by a proper choice of <p. 

But a large condition number can render numerically singular. This happens often 
when we have a larger n and/or use smoother RBFs (all in the hope to increase accuracy). 
The reason behind this is more obvious in the Fourier domain. Smooth RBFs have fast 
decaying Fourier series, so that all <pi can be approximated to a certain precision e by k 
Fourier series which have their modulus larger than a threshold related to e. Therefore 
when n> k, fa with i = 1 ,..., n cannot be mutually linear-independent to that precision. 
Smoother RBF’s have smaller k. For example, a Gaussian RBF’s Fourier coefficient is 
also a Gaussian function, i.e. it decays square-exponentially, that’s the reason it often 
results in singular matrices even with a moderate n. We also experimentally observed in 
approximating kernel matrices with PCD that as n is larger than a certain threshold the 
k that associates with a certain accuracy would stabilize at a fixed value (which makes 
the complexity 0(k 2 n) linear in n). 

So it is reasonable to use only k instead of n <pi in the approximation © • The 
PCD in Algorithm [3] is the choice for this purpose. Taking e to i as the machine precision 
it picks out the k “bases” (indexed by /3, one of its outputs) so that we can solve for 
the corresponding k coefficients in ^ by a much reduced system using only a [/3,/3]- 
indexed submatrix <l>*: 

L^Lj w* — /* (7) 

with /* the /3-indexed subset of /, and leave the n — k remaining coefficients zero. This 
leads to a solution w — [iu*, 0] T to the original system ©>• This triangular system costs 
only 0{k 2 ) flops to solve. 
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If the original system ©> is consistent, it has a solution despite the rank-deficiency, 
which is w. This can be seen in the following. Let’s first identify some null-space vectors 
of $ by using these block-form definitions 



T 

Z T " 


1 — 

b 

N 

1— 1 

1 * 

H 0 H 

1 

1_ 

# = 

z 

Y 

and N = 

I 


where $ denote the symmetrically p-permuted <h. Notice that 


<E7V = 


0 

Ry 


£ toi~frO 


0 

0 


with R y = Y -Z^~ 1 Z T . 


( 8 ) 


This vanishment is due to Lemma[l]by which the Z<f>~ 1 Z T equals to the lower right block 
of the approximation LL T , therefore Ry is just the lower right block of the remainder 
matrix with \\Ry ||oo = ttol- By ([§]) with a vanishing e to i AT is a group of null-space vector 
of <J>. While the inverse of in block form is □ I2Q1 P-25, Eq. 0.8.5.60 


1 

’$* 

Z T " 

-1 

1 - 

O 

1— 1 

1 * 

■a 

1 _ 


z 

Y 


0 0 


+ NRy 1 N T , 


(9) 


The second term in the right side is the culprit for the instability, as e to i vanishes, in 
case of rank-deficiency it overflows (and in case of full-rank the term would be null). 
It doesn’t harm to drop this term since in this situation it represents the null space 
contribution to the solution. The solution by using only the first term in the right side 
of ([9]) is just w. 

This is analogical to the scenario we solve by inverting a singular value decomposition 
(SVD) of $ with the reciprocals of very small eigenvalues replaced by zeros, which 
produces the solution with the smallest L 2 norm to the underdetermined system [25, P- 
69]. Comparing to that, the PCD solution has a smaller complexity of 0(k 2 n ) at the 
sacrifice of orthogonality of the “selected bases”. 

For the rank-deficient systems the above solution also costs less than a Sherman- 
Morrison-Woodbury|l_2, p.50] invertion of XI + LL T with A a regularization. The two 
differs in the solving procedure in which the former costs 0(k 2 ) flops while the latter 
costs 0(k 2 n). 


5 Applications 

We exemplify two applications of the proposed diagonal pivoted Cross Approximation 
algorithm (Algorithm [ 2 ]) and pivoted Cholesky decomposition algorithm (Algorithm [5]). 

2 In this referenced equation, the An is a typo of 
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5.1 Eigen-decomposition of large matrices 

For aerodynamic robust design and uncertainty quantification the geometric uncertain¬ 
ties of aircraft are often modelled by random fields using Karhunen-Loeve expansion 
(KLE) which requires an eigen-decomposition of the underlying covariance matrix. If 
the random field has a large number of nodes the matrix is often so huge that the eigen- 
decomposition becomes prohibitive for commonly available computing resource. In [21j a 
hierarchical low rank approximation technique is used to reduce the cost. In this example 
we show how the relatively simpler diagonal pivoted Cross Approximation (Algorithm [2]) 
do the same job for KLE with continuous covariance functions. 

Consider the wing surface as shown in the left part of Figure [3] that is discretized into 
56312 mesh nodes and assumed subject to zero-mean Gaussian random perturbations. 
Due to engineering reasons the standard deviation (a) of the perturbation is assumed of 
a distribution as shown in the right part of Figure [3} The correlation of the perturbations 
on any (i,j) pair of node pk — (xk, Vk , z k) is assumed of Gaussian type: 

r(pi,Pj) = e ^ Xi ~ x ^ 2 /Ox+tyi-yj) 2 / 9 y+( Zi ~ z rf/ e2 z 

with the correlation length 6 — (0.1,0.2,0.01). 




Figure 3: Mesh of wing surface (left) and a distribution (right) 

By the above settings, a covariance matrix C of size n — 56312 is composed by 
Cij = cr(pi)cr(pj)r(pi,pj). Suppose and cp a are eigenvalues and eigenvectors of C, a pa¬ 
rameterization and approximation of the random field is given by a truncated Karhunen- 
Loeve expansion (KLE): 

k' 

K(x,y,z) « \/V ip a {x,y,z) (10) 

a=l 

where are independent standard Gaussian random variables, and k' usually much 
smaller than n. 

However, a full eigen-decomposition is usually not only very expensive but also not 
necessary, because C is often large (sized more than 25 gigabyte in our case) and rank- 
deficient due to the high degree of smoothness of the Gaussian correlation function. We 
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apply a diagonal pivoted Cross Approximation (as in Algorithm [2]) to the symmetric 
positive definite matrix C with fc=600 which yields an approximation AA T « C with 
A G M 56312x600 associated with a maximum entry-wise error £=1.05e-16. 

After this approximation the eigen-decomposition can be obtained as follows. Firstly 
make a QR decomposition A — QaRa , followed by a singular value decomposition 
(SVD) RaRa ~ UAU T . Then the diagonal of A contains the eigenvalues of AA J and 
the matrix <l> = QaU contains the eigenvectors. 

Notices the complexity of the QR decomposition and the SVD are 0{k 2 n ) and 0{k 3 ) 
respectively, much smaller than the 0{v?) complexity of a direct eigen-decomposition. 
The Cross Approximation takes about 12 seconds and the successive eigen-decomposition 
about 47 seconds on a 3.5GHz processor. I.e. the low rank approximation reduces the 
prohibitive time cost to less than one minute. 

With the obtained eigenpairs we implement the KLE in equation (10) with k' — k 
which generates a random field 1Z parameterized by 600 Gaussian variables. Imposing 
the perturbation 7Z to the direction normal to the wing surface we obtain randomly 
deformed wing geometries of which Figure [4] displays three examples. 



Figure 4: Three examples of randomly deformed wing (deformation three times exag¬ 
gerated for illustration) 


5.2 Solving radial basis functions system 

Consider approximate the following ID and 2D test functions, 

fi = (6x — 2) 2 sin(12x — 4), x G [0,1] 

/2 = (1 — x\) 2 + 100(^2 — x 2 ) 2 , x G [—1, l] 2 (Rosenbrock function) 

by the linear combination of radial basis functions (RBF) in ^ with a Gaussian type 
RBF 


4>(\\x — ®0||) = e -IIO- a:(i) )/ fl ll 2 


( 11 ) 


in which is a sampled point and 6 a shape parameter. This type of RBF is superior 
in approximating smooth functions but prone to be ill-conditioned. 

We compare the approximation based on the PCD in algorithm [3] (abbreviated as 
RBF-PCD) to that based on the regularized system © (abbreviated as RBF-Chol), 
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Figure 5: 2D Rosenbrock function (/ 2 ) 

in their accuracy and cost. The PCD is run with an error tolerance e to i = £ • £(®ij) in 
which £(a) is the maximum round-off error of operations on floating-point numbers with 
modulus a (on our machine, £(1.0) ~ 2.22e-16), and £ the number of iterations executed. 
The solution of the equation © is determined by solving the reduced system 0 and 
padding zeros to the undetermined entries. In RBF-Chol the amount of regularization 
is n • 

For the ID function fi the training points are chosen on the basis of mid-point 
rule, i.e. X = ? 2 2 ^ }• The accu racy is measured in root mean square 

error (RMSE) on 10000 test points sampled by the same rule. We first investigate the 
accuracy with the shape parameter 9 varies in [0.001,1.5] (which covers the optimal 9 
value) and with n — 50 and 100. An unregularized system ^ (abbreviated as RBF-LU) 
is also included here for comparison, which is solved by a LU decomposition since some 9 
values would lead to numerically non-positive definite matrices. This result is in figure [6] 
where the RMSE is read by the axes on the left while the k value, i.e. the number of 
utilized samples, is read by the axes on the right. 

In figure [6] we see when 9 is near its lower end (in well-conditioned zone) the three 
approximations show no difference in accuracy, as on a full rank system the PCD with 
etoi set to machine tolerance leads to the same solution as does a plain Cholesky decom¬ 
position. At larger 9 values the rank is reduced and RBF-LU displays instability while 
the other two are more stable. Notice the optimal 9 (those lead to the best accuracy) are 
all associated with rank-deficiency, though severe rank-deficiency eventually deteriorates 
the accuracy. For the RBF-Chol this deterioration is caused by the regularization which 
is multiplied by the very large w in this scenario, while for the RBF-PCD by decrease 
of utilized samples. 

Figure [7] graphs the RBF-LU and RBF-PCD approximations with 9 — 0.2 and 1.0, 
n — 50 and 100. Green dots there depict the samples utilized (included in /*) by RBF- 
PCD and red ones the rest. This figure displays vividly that the latter approach produces 
as accurate or better and stabler result by using the “key” samples only. 

To compare the error convergence and time cost along n, we first identify a series of 
optimal 9 for a series of n values, each by a fine grid search, for the regularized RBF-Chol 
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RMSE 


and RBF-PCD repsectively, and make the RBF approximations with the optimal 6 for 
each n. Figure [8] and [9] shows the convergence of RMSE along n and the associated k 
values for fi and /2 respectively. Figure [10] contrasts the time costs of the two approaches 
which are averages of 100 runs. 

It is seen the RBF-PCD has comparable or slightly better accuracy than the regu¬ 
larized RBF-Chol due to elusion of error caused by the regularization, while costs much 
less time. The computational time of RBF-PCD manifests its complexity 0(k 2 n ) which 
is nearly linear in n due to the eventually stabilized k. 



Figure 6: Error of RBF approximations of fi along 0, with n — 50 (left), 100 (right), 
and k of RBF-PCD read by the right axes 


6 Summary 

A new Pivoted Cholesky decomposition (PCD) algorithm is proposed by tuning Cross 
Approximation (CA) to symmetric positive semidefinite (SPSD) matrices. The new 
algorithm is simpler than its likes and gives a sharp bound of entry-wise error in matrix 
approximation. As a by-product a diagonally pivoted CA algorithm is developed for 
efficient low rank approximation of SPSD matrices. 

PCD is proved a valid solver of ill-conditioned systems, as an alternative to those 
resort to regularizations. It has a complexity linear in matrix size n and is much more 
efficient than regularizing solutions which are of cubic complexity. A variant of the 
algorithm provides a fast approximation to the matrix determinant. 

The efficiency advantage of the PCD algorithm is numerically manifested in two 
applications to radial basis function approximations. And the diagonally pivoted CA 
algorithm is shown greatly speed up eigen-decomposition of a large covariance matrix in 
an uncertainty quantification problem. 
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Figure 7: Stabilizing effect of PCD in RBF approximations, with n — 50 (left) and 100 
(right), 6 — 0.2 (top) and 1.0 (bottom). 




Figure 8: Error of RBF approximations of fi along n with optimal 6 (left), and the 
associated k of RBF-PCD (right) 
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Error of RBF approximations of fa along 
k of RBF-PCD (right) 


n with 


n 

optimal 6 


(left), and the 




Figure 10: Time cost of RBF approximation of fa (left) and fa (right) with optimal 6 
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